[#11625][fix] handle list-typed eos_token_id in SamplingParams._setup (#13178)

Open — edenfunf wants to merge 2 commits into NVIDIA:main
Conversation
HuggingFace tokenizers for models like Llama 3.1 may return a list for eos_token_id (e.g. [128001, 128009]). SamplingParams._setup previously assigned this list directly to self.end_id, which is typed Optional[int]. When the C++ binding (tllm.Request) received a list instead of an int, it raised std::bad_cast, crashing trtllm-serve for every inference request.

Fix: when eos_token_id is a list, use the first element as end_id (scalar) and append the remaining tokens to stop_token_ids so that generation still halts on all configured EOS tokens.

Fixes NVIDIA#11625

Signed-off-by: 許元豪 <146086744+edenfunf@users.noreply.github.com>
Contributor
Actionable comments posted: 1
Inline comments:
In `@tests/unittest/llmapi/test_sampling_params.py`:
- Line 1: Update the copyright header string "Copyright 2025 NVIDIA CORPORATION & AFFILIATES" in this new test file to the latest meaningful modification year (change 2025 to 2026) so the file header matches the repository policy; locate the header line at the top of tests/unittest/llmapi/test_sampling_params.py and replace the year accordingly.
📒 Files selected for processing (2)

- tensorrt_llm/sampling_params.py
- tests/unittest/llmapi/test_sampling_params.py
What
SamplingParams._setup() crashed trtllm-serve with std::bad_cast whenever the tokenizer's eos_token_id was a list instead of a scalar int.

Before:

After:
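The original Before/After snippets did not survive this page. A minimal sketch of the change, assuming the attribute names used elsewhere in this PR (end_id, stop_token_ids), might look like:

```python
eos_token_id = [128001, 128009]  # what newer tokenizers may return

# Before (sketch): the list was assigned straight into the int-typed slot,
# which later crashed the C++ binding:
#   self.end_id = eos_token_id

# After (sketch): normalize to a scalar plus extra stop tokens.
if isinstance(eos_token_id, list):
    end_id = eos_token_id[0]
    extra_stop_tokens = list(eos_token_id[1:])
else:
    end_id = eos_token_id
    extra_stop_tokens = []

print(end_id, extra_stop_tokens)  # 128001 [128009]
```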
Why
HuggingFace tokenizers for models like Llama 3.1 expose eos_token_id = [128001, 128009] (a List[int]). _setup() blindly assigned this list to self.end_id, which is typed Optional[int]. Downstream, base_worker.py passes end_id directly into tllm.Request(end_id=...), a C++ nanobind binding that expects int32_t. PyBind/nanobind cannot cast a Python list to int32_t, so it raises std::bad_cast, killing the server for every inference request on any model whose tokenizer advertises multiple EOS tokens.
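The failure mode can be illustrated without nanobind: any strict C integer slot rejects a Python list. Here is a loose analogy using the standard-library ctypes (an analogy only, not the actual tllm.Request code path):

```python
import ctypes

# A C-level int32 happily accepts a scalar token id...
end_id = ctypes.c_int32(128001)
print(end_id.value)  # 128001

# ...but a list cannot be converted to int32_t, much as the nanobind
# binding fails when tllm.Request(end_id=[128001, 128009]) is attempted.
try:
    ctypes.c_int32([128001, 128009])
except TypeError as exc:
    print("rejected:", exc)
```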
Why is this PR open if issue #11625 is closed?
Issue #11625 was closed by a maintainer who was unable to reproduce the crash on their own hardware ("I'm closing this issue. Please feel free to open a new one if you need~"). It was not closed because a fix was merged. The original reporter confirmed the crash persisted after the closure.

upstream/main still contains the unfixed assignment as of this PR. The root cause is version-dependent: newer versions of transformers (≥ 4.43) began returning List[int] from tokenizer.eos_token_id for models whose tokenizer_config.json has a list-valued eos_token_id. Earlier versions always returned a scalar, which is why the bug was not universally reproducible.
How
- Read tokenizer.eos_token_id into a local variable first.
- If it is a list, use eos_token_id[0] as the scalar end_id and append eos_token_id[1:] to stop_token_ids so that generation terminates on all configured EOS tokens, not just the first one.
- The existing generation_config.eos_token_id list-normalisation logic (lines 436-447) is untouched; de-duplication between the two paths is handled by the existing stop_token not in self.stop_token_ids guard.
- The change is minimal and isolated to the 14 new lines inside _setup(). No other call sites, no architecture changes.
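The steps above, including the de-duplication guard between the two paths, can be condensed into a short sketch. The helper name and signature here are hypothetical; the actual code lives inline in _setup() and may differ:

```python
def normalize_eos(eos_token_id, stop_token_ids):
    """Return a scalar end_id; fold any extra EOS tokens into stop_token_ids.

    Hypothetical helper mirroring the behaviour this PR describes, not the
    real tensorrt_llm code.
    """
    if isinstance(eos_token_id, list):
        end_id = eos_token_id[0]
        for stop_token in eos_token_id[1:]:
            # Same guard the existing generation_config path uses, so a
            # token seen by both paths is not appended twice.
            if stop_token not in stop_token_ids:
                stop_token_ids.append(stop_token)
    else:
        end_id = eos_token_id
    return end_id

stops = []
print(normalize_eos([128001, 128009], stops), stops)  # 128001 [128009]

stops = [128009]  # already added by the generation_config path
print(normalize_eos([128001, 128009], stops), stops)  # 128001 [128009]
```

The guard keeps the tokenizer path and the generation_config path from producing duplicate stop tokens, as the description notes.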
Test
Before fix (reproduction):

After fix:

params.end_id == 128001 (int), params.stop_token_ids == [128009].

Regression tests added in tests/unittest/llmapi/test_sampling_params.py (pure Python, no GPU or model weights required):

- test_setup_scalar_eos_token_id
- test_setup_list_eos_token_id_single_element
- test_setup_list_eos_token_id_multiple_elements (end_id is int)
- test_setup_list_eos_token_id_no_duplicate_stop_tokens
- test_setup_end_id_already_set_ignores_tokenizer_eos (end_id not overwritten)
- test_setup_list_eos_token_id_pad_fallback (pad_id fallback still an int)

All pre-commit hooks passed (CRLF, ruff, ruff-format, codespell, DCO).